Communication Avoiding (CA) and Other Innovative Algorithms

Authors

  • James Demmel
  • Kathy Yelick
Abstract

In 1981 Hong and Kung proved a lower bound on the amount of communication (the amount of data moved between a small, fast memory and a large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices are too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case (where communication means the amount of data moved between processors). In both cases the lower bound may be expressed as Ω(#arithmetic operations / √M), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, the Gram–Schmidt algorithm, and algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth cost), we get lower bounds on the number of messages required to move it (latency cost). We extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or whether we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph-theoretic problems. We point out recently designed algorithms that attain many of these lower bounds.
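
To make the Ω(#arithmetic operations / √M) bound concrete, the following is a minimal sketch, not taken from the paper, of the classical blocked matrix multiplication under a simple two-level memory model. The function name `blocked_matmul_traffic`, the assumption that three b-by-b blocks must fit in fast memory at once (so b = floor(√(M/3))), and the counting rules are all illustrative assumptions. It counts the words moved between slow and fast memory and shows that the count scales like n^3/√M.

```python
import math


def blocked_matmul_traffic(A, B, M):
    """Multiply square matrices A and B block by block.

    Returns (C, words), where words counts data moved between slow and
    fast memory under an assumed model: fast memory holds M words and
    must hold one block each of A, B and C at the same time.
    """
    n = len(A)
    b = max(1, math.isqrt(M // 3))  # block size, so that 3*b*b <= M
    C = [[0] * n for _ in range(n)]
    words = 0
    for i0 in range(0, n, b):
        i1 = min(i0 + b, n)
        for j0 in range(0, n, b):
            j1 = min(j0 + b, n)
            # The C block stays in fast memory for the whole k loop.
            for k0 in range(0, n, b):
                k1 = min(k0 + b, n)
                # Load one block of A and one block of B.
                words += (i1 - i0) * (k1 - k0) + (k1 - k0) * (j1 - j0)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        s = 0
                        for k in range(k0, k1):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
            # Write the finished C block back to slow memory once.
            words += (i1 - i0) * (j1 - j0)
    return C, words
```

When b divides n, this model moves 2n^3/b + n^2 words. For n = 16 and M = 48 the block size is 4 and the count is 2304 words, while n^3/√M is about 591, so the blocked algorithm stays within a constant factor of the lower bound's order of magnitude.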

Related resources

An Efficient Deflation Technique for the Communication-Avoiding Conjugate Gradient Method

Communication-avoiding Krylov subspace methods (CA-KSMs) fuse s loop iterations in order to asymptotically reduce sequential and parallel communication costs by a factor of O(s). However, the actual savings depend on the nonzero structure of the system matrix A, and these savings typically diminish as A fills, as is common when preconditioning. Recent efforts target incorporating preconditionin...

Solving the Ride-Sharing Problem with Non-Homogeneous Vehicles by Using an Improved Genetic Algorithm with Innovative Mutation Operators and Local Search Methods

An increase in the number of vehicles in cities leads to several problems, including air pollution, noise pollution, and congestion. To overcome these problems, we need to use new urban management methods, such as using intelligent transportation systems like ride-sharing systems. The purpose of this study is to create and implement an improved genetic algorithms model for ride-sharing with non...

CA-SVM: Communication-Avoiding Support Vector Machines on Clusters

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficie...

CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems

We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficie...

A modified elite ACO based avoiding premature convergence for travelling salesmen problem

The Travelling Salesmen Problem (TSP) is one of the most important and famous combinatorial optimization problems; it aims to find the shortest tour. In this problem, the salesman starts from an arbitrary place called the depot and, after visiting all nodes, finally comes back to the depot. Solving this problem seems hard because program statement is simple and leads this problem belonging to NP...

Publication date: 2013